Textractor: A Framework for Extracting Relevant Domain Concepts from Irregular Corporate Textual Datasets
Abstract
Various information extraction (IE) systems for corporate usage exist. However, none of them targets the product development and/or customer service domain, despite its significant application potential and benefits. This domain also poses new scientific challenges, such as the lack of external knowledge resources and irregularities like ungrammatical constructs in textual data, which compromise successful information extraction. To address these issues, we describe the development of Textractor, an application for accurately extracting relevant concepts from irregular textual narratives in datasets of product development and/or customer service organizations. The extracted information can subsequently be fed to a host of business intelligence activities. We present novel algorithms, combining both statistical and linguistic approaches, for the accurate discovery of relevant domain concepts from highly irregular/ungrammatical texts. Evaluations on real-life corporate data revealed that Textractor extracts domain concepts, realized as single or multi-word terms in ungrammatical texts, with high precision.
Related resources
Towards A Semantic Tagger for Analysing Contents of Chinese Corporate Reports
In this paper, we report on an experiment in which we explore the feasibility of applying a semantic tagger for analysing the textual contents of Chinese corporate reports, focusing on the contents of corporate strategy. In recent years, Natural Language Processing (NLP) research has been giving increasing attention to automatic analysis of the textual contents of corporate reports using NLP ap...
A Model for Extracting Information from Textual Documents Based on Text Mining in the E-Learning Domain
As computer networks become the backbones of science and economy, enormous quantities of documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
Extracting Meronymy Relationships from Domain-Specific, Textual Corporate Databases
Various techniques for learning meronymy relationships from open-domain corpora exist. However, extracting meronymy relationships from domain-specific, textual corporate databases has been overlooked, despite numerous application opportunities, particularly in domains like product development and/or customer service. These domains also pose new scientific challenges, such as the absence of elabor...
From Glossaries to Ontologies: Extracting Semantic Structure from Textual Definitions
Learning ontologies requires the acquisition of relevant domain concepts and taxonomic, as well as non-taxonomic, relations. In this chapter, we present a methodology for automatic ontology enrichment and document annotation with concepts and relations of an existing domain core ontology. Natural language definitions from available glossaries in a given domain are processed and regular expressi...
Choosing appropriate theories for understanding hospital reporting of adverse drug events, a theoretical domains framework approach
Adverse drug events (ADEs) may cause serious injuries including death. Spontaneous reporting of ADEs plays a great role in their detection and prevention; however, underreporting always exists. Although several interventions have been utilized to solve this problem, they are mainly based on experience and the rationale for choosing them has no theoretical base. The vast variety of behavioral ...